[SPARK-17480][SQL][FOLLOWUP] Fix more instances which calls List.length/size which is O(n)#15093
[SPARK-17480][SQL][FOLLOWUP] Fix more instances which calls List.length/size which is O(n)#15093HyukjinKwon wants to merge 6 commits intoapache:masterfrom
Conversation
|
@srowen Could you please review this? |
| gamma: Double): CoordinateMatrix = { | ||
| require(gamma > 1.0, s"Oversampling should be greater than 1: $gamma") | ||
| require(colMags.size == this.numCols(), "Number of magnitudes didn't match column dimension") | ||
| require(colMags.length == this.numCols(), "Number of magnitudes didn't match column dimension") |
There was a problem hiding this comment.
This is different, yeah. You're just avoiding the implicit conversion. While that's sort of good I think it's separate and maybe something to change or not change everywhere at once.
There was a problem hiding this comment.
Sure! I will revert this change.
|
@srowen Please let me check this out again tomorrow. |
|
Test build #65366 has finished for PR 15093 at commit
|
|
Test build #65369 has finished for PR 15093 at commit
|
| var i = 0 | ||
| while (i < groupedWindowExpressions.size) { | ||
| val ((partitionSpec, orderSpec), windowExpressions) = groupedWindowExpressions(i) | ||
| groupedWindowExpressions.foreach { case ((partitionSpec, orderSpec), windowExpressions) => |
There was a problem hiding this comment.
Hi @srowen , It seems groupedWindowExpressions is Seq[((Seq[Expression], Seq[SortOrder]), ArrayBuffer[NamedExpression])] but the desired output is LogicalPlan. So, it seems I can't directly use fold or reduce simply. So, could this just use foreach rather than others? Otherwise, please teach me for a better expression.
There was a problem hiding this comment.
Well currentChild is a Window, and that's the only thing whose computation is changed here. How about:
val currentChild =
groupedWindowExpressions.foldLeft(child) {
case (last, ((partitionSpec, orderSpec), windowExpressions)) =>
Window(windowExpressions, partitionSpec, orderSpec, last)
}
There was a problem hiding this comment.
Ah, that's clear and better. I will try. Thanks!
|
Test build #65440 has finished for PR 15093 at commit
|
|
OK, looks good. Some of these are .size -> .length changes, but I suppose those are related and positive. It avoids an extra method invocation, which might matter in a very tight loop. |
| val currentChild = | ||
| groupedWindowExpressions.foldLeft(child) { | ||
| case (last, ((partitionSpec, orderSpec), windowExpressions)) => | ||
| // Set currentChild to the newly created Window operator. |
There was a problem hiding this comment.
Sorry, one last thing, can you remove this comment? and update the comment above and below this expression? they're no longer applicable. Actually currentChild isn't a great name now anyway. eh, windowOps or something?
There was a problem hiding this comment.
Ah, yes, I should've checked this closely. I will update soon.
|
Test build #65535 has finished for PR 15093 at commit
|
|
retest this please |
|
Test build #65534 has finished for PR 15093 at commit
|
|
Merging to master, and to 2.0 to match the parent JIRA, even though this is probably less important, yet still a clean small win |
…th/size which is O(n) This PR fixes all the instances which was fixed in the previous PR. To make sure, I manually debugged and also checked the Scala source. `length` in [LinearSeqOptimized.scala#L49-L57](https://github.com/scala/scala/blob/2.11.x/src/library/scala/collection/LinearSeqOptimized.scala#L49-L57) is O(n). Also, `size` calls `length` via [SeqLike.scala#L106](https://github.com/scala/scala/blob/2.11.x/src/library/scala/collection/SeqLike.scala#L106). For debugging, I have created these as below: ```scala ArrayBuffer(1, 2, 3) Array(1, 2, 3) List(1, 2, 3) Seq(1, 2, 3) ``` and then called `size` and `length` for each to debug. I ran the bash as below on Mac ```bash find . -name *.scala -type f -exec grep -il "while (.*\\.length)" {} \; | grep "src/main" find . -name *.scala -type f -exec grep -il "while (.*\\.size)" {} \; | grep "src/main" ``` and then checked each. Author: hyukjinkwon <gurwls223@gmail.com> Closes #15093 from HyukjinKwon/SPARK-17480-followup. (cherry picked from commit 86c2d39) Signed-off-by: Sean Owen <sowen@cloudera.com>
|
Test build #65536 has finished for PR 15093 at commit
|
|
@HyukjinKwon @srowen I am not sure how exactly backporting a patch led to an empty file, but this does not seem right. I am reverting this commit in branch 2.0. Please make a new PR to fix this in branch 2.0 correctly. |
|
Oh weird! no idea why that happened. Yeah I'll take care of it from here. |
|
I reverted already! |
…th/size which is O(n) This PR fixes all the instances which was fixed in the previous PR. To make sure, I manually debugged and also checked the Scala source. `length` in [LinearSeqOptimized.scala#L49-L57](https://github.com/scala/scala/blob/2.11.x/src/library/scala/collection/LinearSeqOptimized.scala#L49-L57) is O(n). Also, `size` calls `length` via [SeqLike.scala#L106](https://github.com/scala/scala/blob/2.11.x/src/library/scala/collection/SeqLike.scala#L106). For debugging, I have created these as below: ```scala ArrayBuffer(1, 2, 3) Array(1, 2, 3) List(1, 2, 3) Seq(1, 2, 3) ``` and then called `size` and `length` for each to debug. I ran the bash as below on Mac ```bash find . -name *.scala -type f -exec grep -il "while (.*\\.length)" {} \; | grep "src/main" find . -name *.scala -type f -exec grep -il "while (.*\\.size)" {} \; | grep "src/main" ``` and then checked each. Author: hyukjinkwon <gurwls223@gmail.com> Closes #15093 from HyukjinKwon/SPARK-17480-followup. (cherry picked from commit 86c2d39) Signed-off-by: Sean Owen <sowen@cloudera.com>
|
Yep, saw that. I re-merged this, and yes during conflict resolution QuantileSummaries.scala comes up as a file added only in the master branch, but when I choose to not take the change in the IDE, I see it actually resulted in adding an empty file. I made sure that was not part of the commit and pushed again. Looks as intended now: 5fd354b |
…th/size which is O(n) ## What changes were proposed in this pull request? This PR fixes all the instances which was fixed in the previous PR. To make sure, I manually debugged and also checked the Scala source. `length` in [LinearSeqOptimized.scala#L49-L57](https://github.com/scala/scala/blob/2.11.x/src/library/scala/collection/LinearSeqOptimized.scala#L49-L57) is O(n). Also, `size` calls `length` via [SeqLike.scala#L106](https://github.com/scala/scala/blob/2.11.x/src/library/scala/collection/SeqLike.scala#L106). For debugging, I have created these as below: ```scala ArrayBuffer(1, 2, 3) Array(1, 2, 3) List(1, 2, 3) Seq(1, 2, 3) ``` and then called `size` and `length` for each to debug. ## How was this patch tested? I ran the bash as below on Mac ```bash find . -name *.scala -type f -exec grep -il "while (.*\\.length)" {} \; | grep "src/main" find . -name *.scala -type f -exec grep -il "while (.*\\.size)" {} \; | grep "src/main" ``` and then checked each. Author: hyukjinkwon <gurwls223@gmail.com> Closes apache#15093 from HyukjinKwon/SPARK-17480-followup.
What changes were proposed in this pull request?
This PR fixes all the instances which was fixed in the previous PR.
To make sure, I manually debugged and also checked the Scala source.
lengthin LinearSeqOptimized.scala#L49-L57 is O(n). Also,sizecallslengthvia SeqLike.scala#L106.For debugging, I have created these as below:
and then called
sizeandlengthfor each to debug.How was this patch tested?
I ran the bash as below on Mac
and then checked each.